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replaced by p. I.e., pic, P.L.C. or PLC. 





Re-registration under the Companies Act does not constitute a new legal entity 



PATENTS FORM No. 1/77 (Revised 1982)
(Rules 16, 19)

19/11/84 B3642 PAT*** 10.00

The Comptroller
The Patent Office
25 Southampton Buildings
London, WC2A 1AY

Fee: £10.00

REQUEST FOR GRANT OF A PATENT 

THE GRANT OF A PATENT IS REQUESTED BY THE UNDERSIGNED ON THE BASIS OF THE PRESENT 
APPLICATION 

I Agent's Reference JJD/TEAF/26804

II Title of Invention: CLONED DNA SEQUENCES RELATED TO THE GENOMIC RNA OF
LYMPHADENOPATHY-ASSOCIATED VIRUS (LAV) AND PROTEINS
ENCODED BY SAID LAV GENOMIC RNA.

III Applicant or Applicants (See note 2)

Name (First or only applicant): Institut Pasteur

Country   State   ADP Code No.

Address: 25-28 Rue du Docteur Roux,
75724 Paris Cedex 15, France.

Name (of second applicant, if more than one): Centre National
de la Recherche Scientifique   Country   State

Address:

75007 Paris, France.


IV Inventor (see note 3)

or

(b) A statement on Patents Form No. 7/77 is/will be furnished

V Name of Agent (if any) (See note 4)

Reddie & Grose

ADP CODE NO

VI Address for Service (See note 5)

16 Theobalds Road
London WC1X 8PL

VII Declaration of Priority (See note 6)

Country   Filing date   File number

VIII The Application claims an earlier date under Section 8(3), 12(6), 15(4) or 37(4)
Section No

Earlier application or patent number and filing date

IX Check List (To be filled in by applicant or agent)

A The application contains the following number of sheet(s)

1 .... Sheet(s)
2 .... Sheet(s)
3 .... Sheet(s)
4 .... Sheet(s)
5 .... Sheet(s)

B

Translation of priority document
Request for Search
Statement of Inventorship and Right to Apply

X It is suggested that Figure No. .... of the drawings (if any) should accompany the
abstract when published.


XI Signature (See note 8)

Reddie & Grose, Agents for the Applicant(s)

NOTES:

This form, when completed, should be brought or sent to the Patent Office together
with the prescribed fee and two copies of the description of the invention, and of any drawings.

Enter the name and address of each applicant. Names of individuals should be indicated in full and the 
surname or family name should be underlined. The names of all partners in a firm must be given in full. 
Bodies corporate should be designated by their corporate name and the country of incorporation and, where 
appropriate, the state of incorporation within that country should be entered where provided. Full 
corporate details, eg "a corporation organised and existing under the laws of the State of Delaware, United 
States of America," trading styles, eg "trading as xyz company", nationality, and former names, eg "formerly 
[known as] ABC Ltd." are not required and should not be given. Also enter applicant(s) ADP Code No. (if known).

Where the applicant or applicants is/are the sole inventor or the joint inventors, the declaration (a) to that 
effect at IV should be completed, and the alternative statement (b) deleted. If, however, this is not the case 
the declaration (a) should be struck out and a statement will then be required to be filed upon Patent 
Form No 7/77. 

If the applicant has appointed an agent to act on his behalf, the agent's name and the address of his place of 
business should be indicated in the spaces available at V and VI. Also insert agent's ADP Code No. (if known) 
in the box provided. 

An address for service in the United Kingdom to which all documents may be sent must be stated at VI. It 
is recommended that a telephone number be provided if an agent is not appointed. 

The declaration of priority at VII should state the date of the previous filing and the country in which it was
made and indicate the file number, if available. 

When an application is made by virtue of section 8(3), 12(6), 15(4), or 37(4) the appropriate section should 
be identified at VIII and the number of the earlier application or any patent granted thereon identified. 

Attention is directed to rules 90 and 106 of the Patent Rules 1982. 

Attention of applicants is drawn to the desirability of avoiding publication of inventions relating to any
article, material or device intended or adapted for use in war (Official Secrets Acts, 1911 and 1920). In
addition after an application for a patent has been filed at the Patent Office the comptroller will consider
whether publication or communication of the invention should be prohibited or restricted under section 22
of the Act and will inform the applicant if such prohibition is necessary.

Applicants resident in the United Kingdom are also reminded that, under the provisions of section 23,
applications may not be filed abroad without written permission unless an application has been filed not less
than six weeks previously in the United Kingdom for a patent for the same invention and either no direction
prohibiting publication or communication has been given or any such direction has been revoked.

Cloned DNA sequences related to the genomic RNA of lymphadenopathy-associated
virus (LAV) and proteins encoded by said LAV genomic RNA

The invention relates to cloned DNA sequences
indistinguishable from genomic RNA and DNA of
lymphadenopathy-associated virus (LAV), a process for their
preparation and their uses. It relates more particularly
to stable probes including a DNA sequence which can be
used for the detection of the LAV virus or related viruses
or DNA proviruses in any medium, particularly biological
samples containing any of them. The invention also relates
to polypeptides, whether glycosylated or not, encoded by
said DNA sequences.

Lymphadenopathy-associated virus (LAV) is a human
retrovirus first isolated from the lymph node of a
homosexual patient with lymphadenopathy syndrome, frequently a
prodrome or a benign form of acquired immune deficiency
syndrome (AIDS). Subsequently other LAV isolates have been
recovered from patients with AIDS or pre-AIDS. All available
data are consistent with the virus being the causative
agent of AIDS.

A method for cloning such DNA sequences has already
been disclosed in British Patent Application Nr.
84 23659 filed on September 19, 1984. Reference is hereafter
made to that application as concerns subject matter
in common with the further improvements to the invention
disclosed herein.

The present invention aims at providing additional
new means which should not only also be useful for the
detection of LAV or related viruses (hereafter more
generally referred to as "LAV viruses"), but also have
more versatility, particularly in detecting specific parts
of the genomic DNA of said viruses whose expression products
are not always directly detectable by immunological
methods.

The present invention further aims at providing
polypeptides containing sequences in common with polypeptides
encoded by the LAV genomic RNA. It relates even more
particularly to polypeptides comprising antigenic determinants
included in the proteins encoded and expressed by
the LAV genome occurring in nature. An additional object of
the invention is to further provide means for the
detection of proteins related to LAV virus, particularly
for the diagnosis of AIDS or pre-AIDS or, to the contrary,
for the detection of antibodies against the LAV virus or
proteins related therewith, particularly in patients
afflicted with AIDS or pre-AIDS or more generally in
asymptomatic carriers and in blood-related products.
Finally the invention also aims at providing immunogenic
polypeptides, and more particularly protective
polypeptides for use in the preparation of vaccine
compositions against AIDS or related syndromes.

The present invention relates to additional DNA
fragments, hybridizable with the genomic RNA of LAV as
they will be disclosed hereafter, as well as with additional
cDNA variants corresponding to the whole genomes of
LAV viruses. It further relates to DNA recombinants
containing said DNAs or cDNA fragments.

The invention relates more particularly to a cDNA
variant corresponding to the whole of LAV retroviral
genomes, which is characterized by a series of restriction
sites in the order hereafter (from the 5' end to the 3'
end).

The coordinates of the successive sites of the
whole LAV genome (restriction map) are also indicated
hereafter, with respect to the Hind III site (selected as of
coordinate 1) which is located in the R region. The
coordinates are estimated with an accuracy of ± 200 bp:

Hind III       0
Sac I         50
Hind III     520
Pst I        800
Hind III   1 100
Bgl II     1 500
Kpn I      3 500
Kpn I      3 900
Eco RI     4 100
Eco RI     5 300
Sal I      5 500
Kpn I      6 100
Bgl II     6 500
Bgl II     7 600
Hind III   7 850
Bam HI     8 150
Xho I      8 600
Kpn I      8 700
Bgl II     8 750
Bgl II     9 150
Sac I      9 200
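
By way of illustration only, the sizes of the fragments expected from a complete Hind III digestion can be estimated from the approximate coordinates listed above. The following Python sketch is not part of the original disclosure: the function name `fragment_sizes` is hypothetical, and only the first four Hind III coordinates of the map (0, 520, 1 100 and 7 850) are used.

```python
# Approximate Hind III coordinates (bp) taken from the restriction map above.
# They are stated in the text to be accurate only to within about 200 bp.
HIND_III_SITES = [0, 520, 1100, 7850]


def fragment_sizes(sites):
    """Return the sizes (bp) of the fragments between consecutive sites."""
    ordered = sorted(sites)
    return [b - a for a, b in zip(ordered, ordered[1:])]


print(fragment_sizes(HIND_III_SITES))  # [520, 580, 6750]
```

The computed sizes inherit the ± 200 bp uncertainty of the coordinates and should be read as estimates only.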

Another DNA variant according to this invention
optionally contains an additional Hind III approximately
at the 5 550 coordinate.

Reference is further made to fig. 1 which shows a
more detailed restriction map of said whole DNA (λJ19).

An even more detailed nucleotidic sequence of a
preferred DNA according to the invention is shown in figs.
4-12 hereafter.

The invention further relates to other preferred
DNA fragments which will be referred to hereafter.

Additional features of the invention will appear
in the course of the non-limitative disclosure of additional
features of preferred DNAs of the invention, as well
as of preferred polypeptides according to the invention.
Reference will further be had to the drawings in which:

- fig. 1 is the restriction map of a complete LAV genome
(clone λJ19);

- figs. 2 and 3 show diagrammatically parts of the three
possible reading phases of LAV genomic RNA, including the
open reading frames (ORF) apparent in each of said reading
phases;

- figs. 4-12 show the successive nucleotidic sequences of
a complete LAV genome. The possible peptidic sequences in
relation to the three possible reading phases related to
the nucleotidic sequences shown are also indicated;

- figs. 13-18 reiterate the sequence of part of the LAV
genome containing the genes coding for the envelope
proteins, with particular boxed peptidic sequences which
correspond to groups which normally carry glycosyl groups.

The sequencing and determination of sites of particular
interest was carried out on a phage recombinant
corresponding to λJ19 disclosed in the abovesaid British
Patent application Nr. 84 23659. A method for preparing it
is disclosed in that application.

The whole recombinant phage DNA of clone λJ19
(disclosed in the earlier application) was sonicated
according to the protocol of DEININGER (1983), Analytical
Biochem. 129, 216. The DNA was repaired by a Klenow
reaction for 12 hours at 16°C. The DNA was electrophoresed
through 0.6 % agarose gel and DNA in the size range of
300-600 bp was cut out and electroeluted and precipitated.
Resuspended DNA (in 10 mM Tris, pH 8; 0.1 mM EDTA) was
ligated into M13mp8 RF DNA (cut by the restriction enzyme
SmaI and subsequently treated with alkaline phosphatase), using T4 DNA-
and RNA-ligases (Maniatis T. et al (1982), Molecular
cloning, Cold Spring Harbor Laboratory). An E. coli
strain designated as TG1 was used for further study. This
strain has the following genotype:

Δlac pro, supE, thi, F'traD36, proAB, lacIq, ZΔM15, r-

This E. coli TG1 strain has the peculiarity of
enabling recombinants to be recognized easily. The blue
colour of the cells transfected with plasmids which did
not recombine with a fragment of LAV DNA is not modified.
To the contrary cells transfected by a recombinant plasmid
containing a LAV DNA fragment yield white colonies. The
technique which was used is disclosed in Gene (1983), 26,
101.

This strain was transformed with the ligation mix
using the Hanahan method (Hanahan D. (1983), J. Mol. Biol.
166, 557). Cells were plated out on tryptone-agarose plates
with IPTG and X-gal in soft agarose. White plaques were
either picked and screened or screened directly in situ
using nitrocellulose filters. Their DNAs were hybridized
with nick-translated DNA inserts of pUC18 Hind III
subclones of λJ19. This permitted the isolation of the
plasmids or subclones of λ which are identified in the
table hereafter. In relation to this table it should also
be noted that the designation of each plasmid is followed
by the deposition number of a cell culture of E. coli TG1
containing the corresponding plasmid at the "Collection
Nationale des Cultures de Micro-organismes" (C.N.C.M.) of
the Pasteur Institute in Paris, France. A non-transformed
TG1 cell line was also deposited at the C.N.C.M. under Nr.
I-364. All these deposits took place on November 15, 1984.
The sizes of the corresponding inserts derived from the
LAV genome have also been indicated.


TABLE

Essential features of the recombinant plasmids

- pJ19-1 plasmid (I-365), 0.5 kb:
  Hind III - Sac I - Hind III

- pJ19-17 plasmid (I-367), 0.6 kb:
  Hind III - Pst I - Hind III

- pJ19-6 plasmid (I-366), 1.5 kb:
  Hind III (5') - Bam HI - Xho I - Kpn I - Bgl II - Sac I (3') - Hind III

- pJ19-13 plasmid (I-368), 6.7 kb:
  Hind III (5') - Bgl II - Kpn I - Kpn I - Eco RI - Eco RI - Sal I - Kpn I - Bgl II - Bgl II - Hind III (3')


Positively hybridizing M13 phage plates were grown
up for [illegible] hours and the single-stranded DNAs were
extracted. M13mp8 subclones of λJ19 DNAs were sequenced
according to the dideoxy method and technology devised by
Sanger et al (Sanger et al (1977), Proc. Natl. Acad. Sci.
USA, 74, 5463 and M13 cloning and sequencing handbook,
AMERSHAM (1983)). The oligonucleotide primer,
α-35S dATP (400 Ci/mmol, AMERSHAM), and 0.5X-5X buffer
gradient gels (Biggin M.D. et al (1983), Proc. Natl. Acad.
Sci. USA, 80, 3963) were used. Gels were read and put into
the computer under the programs of Staden (Staden R.,
Nucl. Acids Res. [illegible]). All the appropriate
references and methods can be found in the AMERSHAM M13
cloning and sequencing handbook.

The complete sequence of λJ19 deduced from the
experiments is further disclosed hereafter.

Figs. 4-12 provide the DNA nucleotide sequence of
the complete genome of LAV. The numbering of the
nucleotides starts from a left-most Hind III restriction
site (5' AAG...) of the restriction map. The numbering
occurs in tens, whereby the last zero number of each of the
numbers occurring on the drawings is located just below the
nucleotide corresponding to the nucleotide designated.
I.e. the nucleotide at position 10 is T, the nucleotide at
position 20 is C, etc.

Above each of the lines of the successive nucleotide
sequence there are provided three lines of single
letters corresponding to the aminoacid sequence deduced
from the DNA sequence (using the genetic code) for each of
the three reading phases, whereby said single letters have
the following meanings.

A  alanine
R  arginine
K  lysine
H  histidine
C  cysteine
M  methionine
W  tryptophan
F  phenylalanine
Y  tyrosine
L  leucine
V  valine
I  isoleucine
G  glycine
T  threonine
S  serine
E  glutamic acid
D  aspartic acid
N  asparagine
Q  glutamine
P  proline.

The asterisk signs "*" correspond to stop codons
(i.e. TAA, TAG and TGA).
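
The reading of a nucleotidic sequence in the three reading phases, using the single-letter code above, can be sketched as follows. This Python fragment is an illustration only and is not part of the original disclosure. It assumes the standard genetic code, and it counts the phase from the first nucleotide of the fragment supplied, whereas the figures count it from the numbering of the whole genome. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(dna, phase):
    """Translate dna in reading phase 1, 2 or 3.

    Phase n starts at the n-th nucleotide of the fragment given.
    Codons containing unknown letters are rendered as "?".
    """
    start = phase - 1
    codons = (dna[i:i + 3] for i in range(start, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "?") for codon in codons)


# Short fragments quoted in the description.
print(translate("CCTATA", 1))      # PI
print(translate("AAAGAGGAGA", 1))  # KEE
print(translate("TAATAGTGA", 1))   # ***
```

Calling `translate` with phases 1, 2 and 3 in turn on the same fragment gives the three lines of single letters described above.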

Starting above the first line of the DNA
nucleotidic sequence of fig. 4 the three reading phases
are respectively marked "1", "2", "3", on the left
handside of the drawing. The same relative presentation of
the three theoretical reading phases is then used all over
the successive lines of the LAV nucleotidic sequence.

Figs. 2 and 3 provide a diagrammatized representation
of the lengths of the successive open reading
frames corresponding to the successive reading phases
(also referred to by numbers "1", "2" and "3" appearing in
the left handside part of fig. 2). The relative positions
of these open reading frames (ORF) with respect to the
nucleotidic structure of the LAV genome are referred to by
the scale of numbers representative of the respective
positions of the corresponding nucleotides in the DNA
sequence. The vertical bars correspond to the positions of
the corresponding stop codons.
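
A diagram of this kind can be derived from a sequence by locating the stop codons in each reading phase and taking the stretches between them as open reading frames. The following Python sketch illustrates the principle on a short hypothetical sequence; it is not part of the original disclosure. The function names are assumptions, positions are counted from the first nucleotide of the fragment supplied, and no requirement for an initiating ATG is imposed.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions(dna, phase):
    """1-based positions of the first base of each stop codon in a phase."""
    return [i + 1 for i in range(phase - 1, len(dna) - 2, 3)
            if dna[i:i + 3] in STOP_CODONS]


def open_reading_frames(dna, phase, min_codons=1):
    """Return (start, end) 1-based spans lying between stop codons."""
    frames = []
    start = phase - 1
    for i in range(phase - 1, len(dna) - 2, 3):
        if dna[i:i + 3] in STOP_CODONS:
            if (i - start) // 3 >= min_codons:
                frames.append((start + 1, i))
            start = i + 3
    # Trailing stretch after the last stop codon (complete codons only).
    end = start + ((len(dna) - start) // 3) * 3
    if (end - start) // 3 >= min_codons:
        frames.append((start + 1, end))
    return frames


# Hypothetical toy sequence, chosen only to exercise the functions.
toy = "ATGAAATAGCCCGGGTGAAAA"
print(stop_positions(toy, 1))       # [7, 16]
print(open_reading_frames(toy, 1))  # [(1, 6), (10, 15), (19, 21)]
```

Raising `min_codons` discards short stretches, so that only the longer open reading frames remain.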
1) The "gag gene" (or ORF-gag)

The "gag gene" codes for core proteins.
Particularly it appears that a genomic fragment (ORF-gag)
thought to code for the core antigens including the p25,
p18 and p13 proteins is located between nucleotidic
position 236 (starting with 5' CTA GCG GAG 3') and
nucleotidic position 1759 (ending by CTCG TCA CAA 3'). The
structure of the peptides or proteins encoded by parts of
said ORF is deemed to be that corresponding to phase 2.

The methionine aminoacid "M" coded by the ATG at
position 260-262 is the probable initiation methionine of
the gag protein precursor. The end of ORF-gag and
accordingly of gag protein appears to be located at
position 1759.

The beginning of p25 protein, thought to start by
a P-I-V-Q-N-I-Q-G-Q-M-V-H aminoacid sequence, is
thought to be coded for by the nucleotidic sequence
CCTATA..., starting at position 655.

Hydrophilic peptides in the gag open reading frame
are identified hereafter. They are defined starting from
aminoacid 1 = Met (M) coded by the ATG starting from 260-2
in the LAV DNA sequence.

Those hydrophilic peptides are:

12-32 aminoacids inclusive
37-46
49-79
88-153
158-165
178-188
200-220
226-234
239-264
268-331
352-361
377-390
399-432
437-484
492-498

The invention also relates to any combination of
these peptides.
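
Since the aminoacid positions above are counted from the Met coded by the ATG at positions 260-262, the nucleotidic positions coding for a given peptide follow by simple arithmetic: aminoacid n is coded by the three nucleotides starting at 260 + 3(n - 1). The following Python sketch illustrates this conversion only and is not part of the original disclosure; the function name `aminoacid_to_nucleotides` is hypothetical.

```python
# First nucleotide of the ATG coding for Met 1 of gag, as stated in the text.
GAG_FIRST_CODON_START = 260


def aminoacid_to_nucleotides(first_aa, last_aa,
                             codon1_start=GAG_FIRST_CODON_START):
    """Return the (start, end) nucleotide positions, both inclusive,
    coding for aminoacids first_aa to last_aa."""
    start = codon1_start + 3 * (first_aa - 1)
    end = codon1_start + 3 * last_aa - 1
    return start, end


print(aminoacid_to_nucleotides(1, 1))    # (260, 262)
print(aminoacid_to_nucleotides(12, 32))  # (293, 355)
```

On this counting, aminoacid 500 ends at nucleotide 1759, which agrees with the position given above for the end of ORF-gag.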

2) The "pol gene" (or ORF-pol)

Figs. 4-12 also show that the DNA fragment
extending from nucleotidic position 1555 (starting with
5' TTT TTT ... 3') to nucleotidic position 5066 is thought
to correspond to the pol gene. The polypeptidic structure
of the corresponding polypeptides is deemed to be that
corresponding to phase 1. It stops at position 4583 (end
by 5' G GAT GAG GAT 3').

These genes are thought to code for the virus
polymerase or reverse transcriptase.

3) The envelope gene (or ORF-env)

The DNA sequence thought to code for envelope
proteins is thought to extend from nucleotidic position
5670 (starting with 5' AAA GAG GAG A 3') up to nucleotidic
position 8132 (ending by ... A ACT AAA GAA 3').
Polypeptidic structures of sequences of the envelope
protein correspond to those read according to the "phase
3" reading phase.

The start of env transcription is thought to be at
the level of the ATG codon at positions 5691-5693.

Additional features of the envelope protein coded
by the env genes appear on figs. 13-18. These are to be
considered as paired figs. 13 and 14; 15 and 16; 17 and
18 respectively.

It is to be mentioned that because of format
difficulties:

Fig. 14 overlaps to some extent with fig. 13.

Fig. 16 overlaps to some extent with fig. 15.

Fig. 18 overlaps to some extent with fig. 17.

Thus for instance figs. 13 and 14 must be considered
together. Particularly the sequence shown on the
first line on the top of fig. 13 overlaps with the
sequence shown on the first line on the top of fig. 14. In
other words the starting of the reading of the successive
sequences of the env gene as represented in figs. 13-18
involves first reading the first line at the top of fig.
13 then proceeding further with the first line of fig. 14.
One then returns to the beginning of the second line of
fig. 13, then again further proceeds with the reading of
the second line of fig. 14, etc. The same observations
then apply to the reading of the paired figs. 15 and 16,
and paired figs. 17 and 18, respectively.

The locations of neutralizing epitopes are further
apparent in figs. 13-18. Reference is more particularly
made to the boxed groups of three letters included in the
aminoacid sequences of the envelope proteins (reading
phase 3) which can be designated generally by the formula
N-X-S or N-X-T, wherein X is any other possible aminoacid.
Thus the initial protein product of the env gene is a
glycoprotein of molecular weight in excess of 91,000. These
groups are deemed to generally carry glycosylated groups.
These N-X-S and N-X-T groups with attached glycosylated
groups form together hydrophilic regions of the protein
and are deemed to be located at the periphery of and to be
exposed outwardly with respect to the normal conformation
of the proteins. Consequently they are considered as being
epitopes which can efficiently be brought into play in
vaccine compositions.
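
The groups of formula N-X-S or N-X-T can be located in any aminoacid sequence written in the single-letter code by a simple pattern search. The Python sketch below is an illustration only and not part of the original disclosure. It assumes that X may be any single-letter aminoacid, as the text states; the test peptide is hypothetical and is not taken from the LAV sequence, and the name `find_sequons` is an assumption.

```python
import re

# N-X-S or N-X-T. The lookahead lets overlapping groups be reported.
SEQUON = re.compile(r"(?=(N[A-Z][ST]))")


def find_sequons(protein):
    """Return (1-based position, tripeptide) for each N-X-S / N-X-T group."""
    return [(m.start() + 1, m.group(1)) for m in SEQUON.finditer(protein)]


# Hypothetical peptide used only to exercise the function.
print(find_sequons("MKNGSANNTTW"))  # [(3, 'NGS'), (7, 'NNT'), (8, 'NTT')]
```

The positions returned are those of the asparagine (N) of each group, counted from the first aminoacid of the sequence supplied.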

The invention thus concerns with more particularity
peptide sequences included in the env-proteins and
excisable therefrom (or having the same aminoacid
structure), having sizes not exceeding 200 aminoacids.

Preferred peptides of this invention (referred to
hereafter as a, b, c, d, e, f) are deemed to correspond to
those encoded by the nucleotide sequences which extend
respectively between the following positions:

a) from about 6095 to about 6200
b) from about 6260 to about 6310
c) from about 6390 to about 6440
d) from about 6485 to about 6620
e) from about 6860 to about 6930
f) from about 7535 to about 7630

Other hydrophilic peptides in the env open reading
frame are identified hereafter. They are defined starting
from aminoacid 1 = lysine (K) coded by the AAA at position
5670-2 in the LAV DNA sequence.

These hydrophilic peptides are:

8-23 aminoacids inclusive
63-78
82-90
97-123
127-183
197-201
239-294
300-327
334-381
397-424
488-500
510-523
551-577
594-603
621-630
657-679
719-758
760-803

The invention also relates to any combination of
these peptides.

4) The other ORFs

The invention further concerns DNA sequences which
provide open reading frames defined as ORF-Q, ORF-R and as
"1", "2", "3", "4", "5", the relative position of which
appears more particularly in figs. 2 and 3.

These ORFs have the following locations:

ORF-Q   phase 1   start 4478   stop 5086
ORF-R   phase 2   start 8249   stop 8896
ORF-1   phase 1   start 5029   stop 5318
ORF-2   phase 2   start 5273   stop 5515
ORF-3   phase 1   start 5383   stop 5616
ORF-4   phase 2   start 5519   stop 5773
ORF-5   phase 1   start 7965   stop 8279


The LTR (long terminal repeats) can be defined as
lying between position 8560 and position 160 (end extending
over position 9097/1). As a matter of fact the end of
the genome is at 9097 and, because of the LTR structure of
the retrovirus, links up with the beginning of the
sequence:

       Hind III
CTCAATAAAGCTTGCCTTG

9097 1
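
Because the LTR runs past the last numbered position (9097) and continues at position 1, its length cannot be obtained by a plain subtraction. The following Python sketch shows the arithmetic under the assumption that both stated positions (8560 and 160) are inclusive. It is an illustration only and not part of the original disclosure; the function name `wrapped_span_length` is hypothetical.

```python
GENOME_END = 9097  # last numbered nucleotide, as stated in the text


def wrapped_span_length(start, end, genome_end=GENOME_END):
    """Length of a span that may run past genome_end and resume at 1."""
    if end >= start:
        return end - start + 1
    return (genome_end - start + 1) + end


print(wrapped_span_length(8560, 160))  # 698
```

The figure obtained is only as reliable as the two boundary positions, which the text gives as approximate definitions of the LTR.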

The invention concerns more particularly all the
DNA fragments which have been more specifically referred
to hereabove and which correspond to open reading frames.
It will be understood that the man skilled in the art will
be able to obtain them all, for instance by cleaving an
entire DNA corresponding to the complete genome of a LAV
species, such as by cleavage by a partial or complete
digestion thereof with a suitable restriction enzyme and
by the subsequent recovery of the relevant fragments. The
different DNAs disclosed in the earlier mentioned British
Application can be resorted to also as a source of suitable
fragments. The techniques disclosed hereabove for
the isolation of the fragments which were then included in
the plasmids referred to hereabove and which were then
used for the DNA sequencing can be used.

Of course other methods can be used. Some of them
have been exemplified in the earlier British Application.
Reference is for instance made to the following methods.

a) DNA can be transfected into mammalian cells
with appropriate selection markers by a variety of
techniques: calcium phosphate precipitation, polyethylene
glycol, protoplast-fusion, etc.

b) DNA fragments corresponding to genes can be
cloned into expression vectors for E. coli, yeast or
mammalian cells and the resultant proteins purified.

c) The proviral DNA can be "shot-gunned" (fragmented)
into procaryotic expression vectors to generate
fusion polypeptides. Recombinants producing antigenically
competent fusion proteins can be identified by simply
screening the recombinants with antibodies against LAV
antigens.

The invention also relates more specifically to
cloned probes which can be made starting from any DNA
fragment according to this invention, thus to recombinant
DNAs containing such fragments, particularly any plasmids
amplifiable in procaryotic or eucaryotic cells and carrying
said fragments.

Using the cloned DNA fragments as a molecular
hybridization probe - either by marking with radionucleotides
or with fluorescent reagents - LAV virion RNA may be
detected directly in the blood, body fluids and blood
products (e.g. of the antihemophilic factors such as
Factor VIII concentrates) and vaccines, e.g. hepatitis B
vaccine. It has already been shown that whole virus can be
detected in culture supernatants of LAV producing cells. A
suitable method for achieving that detection comprises
immobilizing virus onto a support, e.g. nitrocellulose
filters, etc., disrupting the virion and hybridizing with
labelled (radiolabelled or "cold" fluorescent- or
enzyme-labelled) probes. Such an approach has already been
developed for Hepatitis B virus in peripheral blood
(according to SCOTTO J. et al, Hepatology (1983), 3,
379-384).

Probes according to the invention can also be used
for rapid screening of genomic DNA derived from the tissue
of patients with LAV related symptoms, to see if the
proviral DNA or RNA is present in host tissue and other
tissues.

A method which can be used for such screening
comprises the following steps: extraction of DNA from
tissue, restriction enzyme cleavage of said DNA,
electrophoresis of the fragments and Southern blotting of genomic
DNA from tissues, subsequent hybridization with labelled
cloned LAV proviral DNA. Hybridization in situ can also be
used.

Lymphatic fluids and tissues and other non-lymphatic
tissues of humans, primates and other mammalian
species can also be screened to see if other evolutionarily
related retroviruses exist. The methods referred to
hereabove can be used, although hybridization and washings
would be done under non-stringent conditions.

The DNA according to the invention can be used
also for achieving the expression of LAV viral antigens
for diagnostic purposes.

The invention also relates to the polypeptides
themselves which can be expressed by the different DNAs of
the invention, particularly by the ORFs or fragments
thereof, in appropriate hosts, particularly procaryotic or
eucaryotic hosts, after transformation thereof with a
suitable vector previously modified by the corresponding
DNAs.

These polypeptides can be used as diagnostic
tools, particularly for the detection of antibodies in
biological media, particularly in sera or tissues of
persons afflicted with pre-AIDS or AIDS, or simply
carrying antibodies in the absence of any apparent
disorders. Conversely the different peptides according to
this invention can be used themselves for the production
of antibodies, preferably monoclonal antibodies specific
of the different peptides respectively. For the production
of hybridomas secreting said monoclonal antibodies
conventional production and screening methods are used.
These monoclonal antibodies, which themselves are part of
the invention, then provide very useful tools for the
identification and even determination of relative
proportions of the different polypeptides or proteins in
biological samples, particularly human samples containing
LAV or related viruses.

Thus all of the above peptides can be used in
diagnostics as sources of immunogens or antigens free of
viral particles, produced using non-permissive systems,
and thus of little or no biohazard risk.

The invention further relates to the hosts (procaryotic
or eucaryotic cells) which are transformed by the
above mentioned recombinants and which are capable of
expressing said DNA fragments.

Finally it also relates to vaccine compositions
whose active principle is to be constituted by any of the
expressed antigens, i.e. whole antigens, fusion polypeptides
or oligopeptides in association with a suitable
pharmaceutical or physiologically acceptable carrier.

Preferably the active principles to be considered
in that field consist of the peptides containing less than
250 aminoacid units, preferably less than 150, as deducible
from the complete genomes of LAV, and even more preferably
those peptides which contain one or more groups selected
from N-X-S and N-X-T as defined above. Preferred peptides
for use in the production of vaccinating principles are
peptides (a) to (f) as defined above. By way of example
having no limitative character, there may be mentioned
that suitable dosages of the vaccine compositions are
those which enable administration to the host,
particularly human host, ranging from 10 to 500 micrograms
per kg, for instance 50 to 100 micrograms per kg.

For the purpose of clarity figs. 19 to 26 are
added. Reference may be made thereto in case of difficulties
of reading blurred parts of figs. 4 to 12.
Needless to say that figs. 19-26 are merely a
reiteration of the whole DNA sequence of the LAV genome.

Finally the invention also concerns vectors for
the transformation of eucaryotic cells of human origin,
particularly lymphocytes, the polymerases of which are
capable of recognizing the LTRs of LAV. Particularly said
vectors are characterized by the presence of a LAV LTR
therein, said LTR being then active as a promoter enabling
the efficient transcription and translation, in a suitable
host of the kind defined above, of a DNA insert coding for a
determined protein placed under its control.

Needless to say that the invention extends to all
variants of genomes and corresponding DNA fragments (ORFs)
having substantially equivalent properties, all of said
genomes belonging to retroviruses which can be considered
as equivalents of LAV.


CLAIMS 

1. A DNA fragment of LAV extending from nucleotide
position 236 to nucleotide position 1759.

2. A DNA fragment of LAV extending from nucleotide
position 1555 to nucleotide position 5066.

3. A DNA fragment of LAV extending from nucleotide
position 5670 to nucleotide position 8132.

4. A vector containing a DNA fragment according to
any of claims 1 to 3.

5. Peptide corresponding to any of those encoded
by the nucleotide sequences which extend respectively
between the following positions:

a) from about 6095 to about 6200
b) from about 6260 to about 6310
c) from about 6390 to about 6440
d) from about 6485 to about 6620
e) from about 6860 to about 6930
f) from about 7535 to about 7630.

6. Peptide characterized by a sequence of aminoacids
deducible from LAV DNA, the terminal aminoacids of
which extend between the following positions with respect
to the lysine (position 1) coded by the AAA at position
5670-5672 in the LAV DNA:

8-23 aminoacids inclusive
63-78
82-90
97-123
127-163
197-201
239-294
300-327
334-381
397-424
466-500
510-523
551-577
594-603
621-630
657-679
719-750
760-803

or any combination of these peptides.

7. Peptide corresponding to the aminoacid
sequence deducible from LAV DNA and the terminal
aminoacids of which are positioned at the positions
hereafter, counted from the Met at position 1 coded by the
ATG sequence at nucleotide positions 260-262:

live
12-32
37-4S
49-79
88-133
isa-16S
178-188
200-220
226-234
239-264
288-331
332-361
377-390
399-432
437-484
492-498

and combinations of said peptides.

8. Diagnostic means containing any of the DNA
fragments of any of claims 1 to 3.

9. Diagnostic means containing any of the peptides
of any of claims 4 to 6.

10. Vaccine compositions containing any of the
peptides according to any of claims 4 to 6 in association
with a pharmaceutical vehicle.
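
The claims identify DNA fragments and peptides purely by position. As an illustration only, the following Python sketch shows the underlying arithmetic. It assumes 1-based inclusive nucleotide numbering, as in the sequence figures, and an uninterrupted reading frame between the reference codon and the residue of interest. The function names `fragment` and `residues_to_nucleotides` are hypothetical and are not part of the application.

```python
def fragment(sequence: str, start: int, end: int) -> str:
    """Return nucleotides `start` to `end` of `sequence`, 1-based inclusive."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError("positions out of range")
    return sequence[start - 1:end]


def residues_to_nucleotides(first_aa: int, last_aa: int, codon1_start: int) -> tuple[int, int]:
    """Map an amino-acid range to the nucleotide positions that encode it.

    `codon1_start` is the position of the first nucleotide of the codon
    counted as residue 1 (assumption: no frameshift in between).
    """
    if not 1 <= first_aa <= last_aa:
        raise ValueError("invalid residue range")
    start = codon1_start + 3 * (first_aa - 1)
    end = codon1_start + 3 * last_aa - 1
    return start, end


# First ten nucleotides of the LAV sequence as printed in the figures.
lav_start = "AAGCTTGCCT"
print(fragment(lav_start, 1, 6))               # AAGCTT
# Residues 63-78 counted from a residue 1 whose codon starts at position 5670.
print(residues_to_nucleotides(63, 78, 5670))   # (5856, 5903)
```

The same mapping applies to peptides counted from a methionine codon: only the value passed as `codon1_start` changes.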


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C. -C AGaOGaCAGCAaC AAA rCuAoLC AGTACATCCT ACACTAGAGCCCTGCAaGCATCCAGGAACTCAGCCTAa. 
5290 S300 5310 5320 ">330 5340 ^3 1>0 


( p $ L F H N K S L * M L L WOE t A E T A T K T S 
OVCFTT K A L C I SYCRKKRRORRRPP 
lc'FVS0OKP*4SP.1A&RSG0S0E0Li 
CC AAGTTTGTTTCACAACAAAAGCCTTAGCCATCTCCTATGGCAGGAAGAAGCGCAGACAGCGACGAAGACCTCC* 
5410 5420 5430 5440 5450 5460 5470 


STCNAT YTnSNSS I SSSNNMSNSCV 
VHV.HOP IQIAIAALVVAIIIAIVV W 
Y1*CNLYK*0*0H***0**#0*. LCG 
AGTACATGTAATCCAACCT ATACA AAT AGCAATAGCAGC ATTAGT AGT AGCA AT A AT A AT ACC AAT AGTT GT GTGC 
5530 5540 5550 5560 5570 5580 5590 


*i-' , JVN**TNRKS,RROWO * E * R R N I S 
IDkLIDRLIERAEDSGNESEGEISA 
♦ T G ♦ L I 0**KE OKTVAMRVKEKYUV 
AaTAGACAG<.TTAATTGATAGACTAATAGAAAGAGCAGAAGACAGTGCCAATGAGACTCAAGCACAAATATCAGC/ 
5650 5660 5670 5630 5690 5700 5710 


^♦♦SVVLOKNCGSOSIMGYLCGRKQ 
^ I 0 D L ♦ C YRKtVGHSLL HGTCVEGSN 
L « I C\ S \ A T E ' K L W V T V Y Y G V P V U K E A T 

TATTGATGATCTGTAGTGCT acacaaaaattgtgggtcacactctattatggcgtacctgtgtggaaggaagcaac 

5770 57*0 5790 5600 5310 5«2C 5830 


KYI ^FGPHrtPVY.PCTPTHKK«Y*** 

gt*clghtclcthrpoptrss ig y c_ 

V H N V m A T . H AC VP.TO .PN P OE V V * L V (n V 
AGGTAC AT AATGTTTCGGCC AC ACATGCCTGTCTACCCACAGACCCCAACCCAC AAGAAGTAGTATTGGTAAATGT 
5670 5300 5910 5920 5930 5940 5950 


C 1 R I ♦ S V Y G \ K A * S H V ~ N * P "M™ S " " V " L V 
A«GYNaFKGSKPKA*CKINPTLC*F 
MEOI ISL .100S.LKPC.VKLTPLCVSL 
TGCATGAGCATAr AATCAGTTT A T G GG ATC A AAG CC T AA ACCC A TG TC TA A A A T T A AC C C C AC TCT CTG T T A GTTT 
6010 6020 6030 6040 6050 6060 6070 

iriVVAGK* + WRKEK*KTALSISA3 
Y 0 » » ♦ k GMDDGERROK K : L L F 0 Y Q H K 
T h S Sj S G E .1 .1 n t: K G E I K N C Sj F Is 1 Si T r . 
ATACC AATAGTAGTAGCGGGGAAATGATGATGGACAAAGGAGAC AT A AAA AACTGCTCTTTCA A T A TC AGC A C A AG 
6130 6140 6150 6160 6170 6160 6190. 


LI*YQ*IMILPAI*»UVVTP0SLHR 

* Y H T fl R » Y YOLYV JKU » H L S H Y T G 

D I I P I 0 /N 0 H TS YTLTSC fN T S| V 'I T 0 A 
f T C A T A T A \ f A C C A A T \ G A T A A T C A T A C T AC C A G C T AT ACCT TCAC A AG T TG T A AC AC C T CaGTCA TT ACaCauCL 
6250 6260 6270 6?*0 '*?90 6100 '.310 

1 - ' • 

P ■* L Y L * F • V 1 IR^S^EOOMVOttSA 


I 


C AG A AG TCAGCC T A A A AC TGCT TG TACC AC TTCC TAT TGT AAA A AG TG TTGC TTTCATTG 
o 5350 5360 5370 5380 5300 5*00 


ATKT-SSPOSDS-SSFSIKAVS 
URRRPPOGSCTHCVSLSKO*V 

SOEOLLKAVRLIKFLYOSSK* 
A GCG ACC AAG ACC TCC TC A AGGC AG TC AG AC TC AT CA AC TTTCTCTATC A A ACCAGTAAGT 

0 5470 5480 5*90 5500 5510 5520 

SMSCVVHSNHRI *ENIKTKK 

1 A I V VWS IV I I EYRK I L R 0 ft K 
_*0*LCGP**S*NlCKY#OKEK 
T ACCAATAGTTGTGTGGTCC ATAGTAATCAT ACAATATAGG AAAATATT AAGACA AAGAA A 

i 5590 5600 5610 5620 5630 5640 


rrnistcgogggngapcslg 
ge isalvemgvemchhapwo 
• k e k y 0 h l w r w g w k w g t m l l c i 
aaggagaaatatcagcacttgtggagatgggggtggaaaYggggcaccatgctccttcgga 

; 5710 5720 5730 5740 5750 5760 

(fe GFK0PPLYFVH0HLKHMIO 

VEGSNHHS ilci*c*si*yp 

VrfKEAT TTLFCASDAXAYDTE 

♦tctcgaagcaaccaaccaccactctattttgtccatcacatgctaaaccatatgatacag 

5330 5840 5850 5*60 5870 5880 

*Y« ***OKILTCGK*TW*NR 
. * I G^ K C 0 RKF*HVEK*HGRTO 
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rTAGTATTCGTAAATGTGACAGAAAATTTTAACATGT GG AA AA ATG AC A TG GT AG AACA GA 
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°LCV5LKCTDLG FN A Tj N T' >N S Si N 
CACTCTCTGTTAGTTTAAAGTGCACTGATTTGGGCAAI I AU' AATACC AATAGTAGTA 
i _ 6070 6030 6090 6100 6110 6120 


5 I SAOA*EVRCPKNMHFFIM 
Q Y Q HKHKR*GAER I C I F L * T 
C f N * SJ TSIRGKVGKEYAFFYKl 
TCA ATATC AGCACAAGCATA AGAGGT A ACGTCC AG A A AG A AT A TGCAT TTTTTT AT A A AC 
6190 • 6200 6210 6??U 6Z30 6240 


OSLMRPVORYPLSOFPYIIV 
-SHYTGLSKG IL*ANSHTLLC 
s}v *ITOACPKVSFEP IPIHYCA 
OGTCATT ACACAGGCCTGTCC A A aGGT ATCCT TTC AGCC A ATT C CC AT AC ATT ATT GT G 
'•310 6120 6330 VJ40 6350 6360 


NVH*-fLGG*YOL* 


< 


PC^FCnss * * | ♦ " > v w » 'i r t * r * r ; 
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CCCCCGCTCCTTTTCCOATTCT AAAATCTA ATA/. TAAOACCTTCAATCGAACAGGACC ATCTACAA/-Tr,TC ACC 
6370 63*i0 639Q 6400 6410 6420 6*30 
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L I fa G St LAEEEVVIRSA H F Tl f) N A K T 
TGCTGTTGAATGGCAGTCT AGCAGAAG A AGAGGTAG TAATTAGA TCTGCC AATTTCAC AG AC A ATGCT AAAACC i 
6490 6500 6510 6520 6530 6540 6550 
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N //» M H RKStRlORGPGRAFvriGKI 
* CC AACA ACAAT AC A AG AAA A AG T ATC CGT ATCC AGAGGGCACC AGGCAGAGC ATTTGTT AC A AT AG G A AAAATAC 
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A TGC Ca'CTTTA AAAC AG AT AGCT AGC A AATT AA G AGAACA ATTTGGAAAT AATAAAACAATAATCTTTAAGCAAT 
^ 6730 6740 6750 6760 6770 6780 6790 
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K G I F L L « F _N T T V » » Y L V » » Y L FY* 1 ? 
G E F F Y C /N S Tl 0 L F fN S Tl W F /N S Tl K S T E 
GA GGGG AATTTTTCT ACTGT AATTCA AC ACAACTGjTTTA AT AGT AC TTGC t T T TAATAG TACTTCGjAGTACTGAAC 
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IKOFI NHHOEVGKAWYAP PISGOI: 
GA A T A A AACA ATTT AT AAAC ATGTGGC AGG A AG T AGG A A A AGCA ATGT AT GCCCCTCCC ATCAGCGGACAAATTA; 
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A N N N fN G S] EIFRPGGGDMRDNWRSEL 
^^TAAT A AC AACAATfiCCrCt GAGATCT TCAGACCTGGAGG AGGAGATATC AGGGAC A ATTGGAGAAGTG AATTAT 

7090 7100 7110 7120 7130 7140 7150 


P rqre'ewcrekkeqwe* elcsl*gsw 
qgkeksgaeftkkssgnrsfvpmvlc 
k akrry vqre k r a v g i gal fl gf l 
ccaaggcaaacagaagagtgctgcacacagaaaaaagagcagtgggaataccagctttgttccttgggttcttgc 

7210 7220 7230 7240 7250 7260 7270 
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tacaggccagacaattattgtctgctatagtccacc agcagaacaatttgctcagggctattgaggcgcaacacj 
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. Ij .0 ? C T pJ V Sj TVOC^Hr, rR°VVSTuL 

..\AC AGGACC ATCT ACAA/.TGTC AGC AC ACT AC A ATOT AC AC AT CCA ATT ACGCC AG T AC T A TC A AC T C A AC 
*U 6420 6*10 6440 64*0 ~ 6460 64 70 64*0 

P1S0T MLKP**YS*TNL*KLIV00 
Q F H »0C*NHMSTA E P T CRN ♦ L Y K T 
• |»i £ T> O NAKTI rYOL 0 Si V E I /N C T^ R P 

:C A A TTTCACACACAATGCTAAAACC AT AA TACT AC ACC TC A ACC A AT CT CT AC A A ATT A A T TG TA C A AC AC 
•0 6540 6550 6560 6570 65H0 * 6590 6600 

fHLL*Q*EK*EI •OKHIVTLVFONG 
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afyttck igm^roahc In i s| r a k w f 5T 

,AGC ATTTGTTACAATAGGA AAAATAGGAAAT AT GAG AC A AGC AC A TTC T A AC A T T AG T AG ACC A A AA TGC A 
6660 6670 6630 6690 6 700 6710 6720 
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.T AATAAAACAATAATCTTT A AGC AA TCCTC A GG AG GGG ACCC AG A A A TTGT A ACGCAC AGTTTT A ATTCTG 
3 6780 6790 6800 6810 *>820 6810 6»40 

GLIVLGVLKGOITLKEVTOSHSHA 
V » » Y L F Y * R V K ♦ M *RK*H\JHTP.1C 
P /N S T 1 S T E G S jN N T j EGSOTITLPC* 

OTTTAATAGTACTTCGAGT ACTGAAGGGTCA AATA ACACTGAAGGAAGTGAC ACAATCACACTCCCATGCA 
') 6900 9 6910 6920 6930 6?40 6950 6960 

m- PLPSAOKLOVHOILQG CY^OEnV 
_C P S H 0 R T M ♦ ii F I K Y Y RAAlNKR WW 

APPISGOIRCSS ]N I Tj GLLLTROGC 
T^CCCCTCCC ATCAGCGCACAAATTAGATGT TC ATCA A AT ATT AC AGGGCTGCTATT AAC A AG AG ATGGTG 
1 7C20 7C20 7040 70 50 7060 7070 7080 

*CTICEVNYINIK**KL*H*E*HP 
ECOLEK*II*I*SSK\'*TIRSSTH 

RPMHRSELYKYKVVK ! E PLGVAPT 
AACGACAATTCGAGAAGTGAATTATATAAATATAAAGTAGTAAAAATTGAACC ATT AG G ACT AGC ACCC A 
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*ELCSLGSUE0 3EALKAHCQ*ft*R 
RSFVPWVLGSSRKHYGRTVNDAOG 
GALFLGFLGAAGSTrtGA RSlTLTY 
AGGAGCTTTGTTCCTTGGGTTCTTGGGAGCAGCAGGAAGC ACTATGGGCGC ACGGTCAATGACGC TCACCG 
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t!-*GLLRRNS[CCNSOSGASSSSRO 
AFGY*GATASVATHSLGHOAAPGK 
LRAIEAOQHLLQUTVWG i K 0 L 0 A R r 
".CTCAGGGCTATTGAGGCGCAAC AGCATCTGTTGCA ACTCACAGTCTGGGCCA TCAACC AGCTCC ACCC A A 
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GAATCCTGGCTGTGGAAACA T ACC T AA AG G ATC A AC A CCTCC TCG GG A TT T GGGG T T GC TCTGGA AAACTCAT1 
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TGGAACAGATTTGGAATAACATGACCTGGATCGAGTGGCACAGACAAATTAACAATT ACACAAGCTTAATACAT 
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II«IR*MGKFVELV» m h K L A V V Y K 
LLELOKHASLHNWF fr? I T} N ' W L W Y I K 
AATTATTGGAATT AGATAAATGGGCAAGTT TGTGGAATTGGTTTAACA TAACA AATTGGC TGTGGTATATAAAi 
7690 7700 7710 7720 7730 77*0 7750 ; 

L L Y F L * * I ELGRO IHHYRFRPTSQP 
CCTFYS£*S*AGIFTI I V S D P P . P N 
AVLS I V/NR VROGY S PLS F OT HLP T 
TT GCTGT ACTTTCT AT AGTG AATAGAG TTAGG CAGGGA TATTCACC ATTATCGTTTC AGACCC ACCTCCC AACC 
7810 7820 7830 78*0 7850 7860 7870 

RETETDPFD**TQP*HL5CT I c g a l 
-ERU.R QIHSIS g P L LSTYLGRSAEP 

RDRDRSIRLV LN G si L A L I WODLRSL 
AGAGAGAC AG AG AC AC ATCC ATTCG AT T AG TG A AC G G A TC C TTAGC AC TT A TCTG GGAC G ATC TGCGG AGCCT7 
7930 7940 7950 7960 7970 7930 7990 

TRIVELLGfcRGWEALKYWWMLLQY* 
RGLWNFMDAGGGKPSNIGCI SYSI 
EOCGTSGTQGVGSPQILVESP'T'VL 
ACGAGGATTGTGGAACTTCTGGGACGC AGGGGGT GGG A AGCCCTC A A A T A T T GGTGG A AT CTCC TACAGTATTG 
8050 8060 8070 8030 8090 8100 3110 


AIAVAECTORV IEVVOCACR'AIRHI 
?*D*LRGOlGL*K*YKELVELFAT 
HSS5*GDR*GYKSSTRSL»SYSPH 
GCCATAGC AGTAGCTGAGGGGAC4GAT AGGGTT AT ACAAGTAGT ACA AGGAGCTTGTAGAGCT ATTCGCCAC AT 
8170 8180 8190 3200 3210 3220 8230 

GWOVVKK + CGMJIAYCKCKNETS + AS 
GGKWSKSS VVG VJPTVR E R M R R A E P 
VASCOKVVWLPGLL*GKE^0ELS0 
GGGTGGCA ACTCGTCAAAAAGTAGTGTGGTTGGATGGCCT ACTGT AAGCCAA AG AATGAGACG AGCTGAGCC AC 
8290 8300 8310 0320 3330 3340 8350 

snhk*oyssyocclclar strgggc 
aitssutaatnaacawlfaoeeee 
osovai u3lp»llvpg + khkr r » 


acc&atcacaagtaccaatacagcacct accaatgctgc ttgtgcctggct(ag AAGC ACfr a gagga ggaggagc 
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:TCCAAAACTCATTTGCACC ACTGCTGTGCCTTGC^ATCC TAGTTCGAGT AATAAATCTC 
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CAAGCTTAATACATTCCTTAATTGAAGAATCGCAAAACCAGCAAGAAAAGAATCAACAAG 
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TGTGGTATATAAAAATATTCATAATGATAGTAGG AGGCTTGGTAGGTTTA AG AATAGTTT 
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C CC ACCTCCC AACCCCG AGGGG ACCCG AC A GGCCCC AAGGA ATAG AAG AAGAAGGTGGAC 
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I^C GAUCLFSYHRLROLLLI V 
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LRSLVP.LOL PPLERLTLOCN 
/^TCTGC GG AGCCTTGTGCCTCTT C AGCT ACC ACCGCTTGAGAGACTTACTCTTG ATTGTA 
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LLQYWSOEL KNSAVSLLNAT 
SYSIGVRN + RIVLLACSMPO 
-PTVLESG TKE*CC*LAOCHS 
TXCTACAGTATTGGAGTCAGGAACTAAAGAATAGTGCTCTTAGCTTGCTCAATGCCACA 
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ctattcgccacatacctagaagaataagacaggccttggaaagcattttgctataagat 
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rs*ASSR*CGSSI5RPGKTW 
RAEPAAOGVGAASROLEKHG 
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CGAGCTGAGCCAGCAGCAG ATGGGGTGGG AC C AGC ATCTCG AG ACCTCGAAAA AC AT CG 

10 20 30 40 50 60

AAGCTTGCCT TGAGTGCTTC AAGTAGTGTG TGCCCGTCTG TTCTGTGACT CTGGTAACTA

70 80 90 100 110 120

CAGATCCCTC AGACCCTTTT AGTCAGTGTG GAAAATCTCT AGCAGTGGCG CCCGAACACG

130 140 150 160 170 180

GACTTGAAAG CGAAAGGGAA ACCAGAGGAG CTCTCTCCAC GCAGGACTCG GCTTGCTGAA

190 200 210 220 230 240

GCGCGCACGG CAAGAGGCGA GGGGAGGCGA CTGGTGAGTA CGCCAAAAAT TTTGACTAGC


250 260 270 280 290 300

GGAGGCTAGA AGGAGAGACA TGGGTGCGAG AGCGTCAGTA TTAAGCGGGG GAGAATTAGA

310 320 330 340 350 360

TCGATGGGAA AAAATTCGGT TAAGGCCAGG GGGAAAGAAA AAATATAAAT TAAAACATAT

370 380 390 400 410 420

AGTATGGGCA AGCACGGAGC TAGAACGATT CGCTGTTAAT CCTGGCCTCT TAGAAACATC

430 440 450 460 470 480

AGAAGGCTGT AGACAAATAC TGGGACAGCT ACAACCATCC CTTCAGACAG GATCAGAACA

490 500 510 520 530 540

ACTTAGATCA TTATATAATA CAGTAGCAAC CCTCTATTGT GTGCATCAAA GGATAGAGAT

550 560 570 580 590 600

AAAAGACACC AAGGAAGCTT TAGACAAGAT AGAGGAAGAG CAAAACAAAA GTAAGAAAAA

610 620 630 640 650 660

AGCACAGCAA GCAGCAGCTG ACACAGGACA CAGCAGCCAG GTCAGCCAAA ATTACCCTAT

670 680 690 700 710 720

AGTGCAGAAC ATCCAGGGGC AAATGGTACA TCAGGCCATA TCACCTAGAA CTTTAAATGC

730 740 750 760 770 780

ATGGGTAAAA GTAGTACAAG AGAAGGCTTT CAGCCCAGAA GTGATACCCA TGTTTTCAGC

790 800 810 820 830 840

ATTATCAGAA GGAGCCACCC CACAAGATTT AAACACCATG CTAAACACAG TGCGGGGACA

850 860 870 880 890 900

TCAAGCAGCC ATGCAAATGT TAAAAGAGAC CATCAATGAG GAAGCTGCAG AATGGCATAG

910 920 930 940 950 960

AGTGCATCCA CTGCATGCAG GGCCTATTGC ACCAGGCCAG ATCAGAGAAC CAAGGGGAAG

970 980 990 1000 1010 1020

TGACATAGCA GGAACTACTA GTACCCTTCA GGAACAAATA GCATGGATCA CAAATAATCC

1030 1040 1050 1060 1070 1080

ACCTATCCCA GTAGGAGAAA TTTATAAAAG ATGGATAATC CTGGGATTAA ATAAAATAGT

1090 1100 1110 1120
AAGAATGTAT AGCCCTACCA GCATTCTCCA
1150 1160 1170

AC actatcta caccggttc* ataaaactct

1210 1220 1230

AAATTCCATC ACACAAACCT TGTTGGTCCA

1270 1280 1290

AAAACCATTC COACCAGCAC CTACACTAGA

1330 1340 1350

AGGACCCGGC CATAAGGCAA GAGTTTTGGC

1390 1400 1410

TACCATAATG ATGCAAAGAG GCAATTTTAG

1450 1460 1470

TTGTGGCAAA GAAGGGCACA TAGCCAGAAA

1510 1520 1530

GAAATGTGGA AAGGAAGGAC ACCAAATGAA

1570 1580 1590

AGGGAAGATC TGGCCTTCCT ACAAGGGAAG

1630 1640 1650

GCCAACAGCC CCACCAGAAG AGAGCTTCAG

1690 1700 1710

GAAGCAGGAG CCCATAGACA AGGAACTGTA

1750 1760 1770

CAACGACCCC TCGTCACAAT AAAGATAGGG

1810 1820 1830

GGAGCAGATG ATACAGTATT AGAAGAAATG

1870 1880 1890

ATAGGGGGAA TTGGAGGTTT TATCAAAGTA

1930 1940 1950

TGTGGACATA AAGCTATAGG TACAGTATTA

1990 2000 2010

AGAAATCTGT TGACTCAGAT TGGTTGCACT

2050 2060 2070

CTACCAGTAA AATTAAAGCC AGGAATGGAT

2110 2120 2130

GAAGAAAAAA TAAAAGCATT AGTAGAAATT

2170 2180 2190

TCAAAAATTG GGCCTCAAAA TCCATACAAT

2230 2240 2250

AGTACTAAAT GCAGAAAATT AGTAGATTTC
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AAWWV III Au 
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1190 
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CAAGCTTCAC 
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^ V W *• w w ■ M MM 

12*0 
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1260 

AAA 1 VWvAAW 

CC ACATTGT A 

wW AW A I 1 V 1 A 

AC Af T ATTTT 
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1310 

1320 

ACAAATCATC 

A W A A A 1 v A 1 II 

AC ACCATCTC 

AW A WW A 1 V# IW 

ACCCACTCCC 
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TAAAT.TA ATG 

act f aa^taa 

Av W W A AW 1 AA 
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1500 
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GAGAGACAGC 
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WM WM Ww M V A 
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CTCCCTCTC A 
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1720 
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Tf CTTTAACT 
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TTCCTCACAT 
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CACTCTTTGG 
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1780 
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GGGCA ACT AA 
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ACGAAGCTCT 

ATTAGATAC A 

1840 
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AGTTTGCCAG 

GA AC ATCG AA 

WM M V M t W M M 

ACCAAAAATG 

MVWMMMM** 1 w 
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ACACACTATfi 
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ATCAGATACT 

Ml W M V M t M W f 

CAT AG AAATC 

I960 
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C ATAATTGGA 
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2030 
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Cf ATTACTCC 
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GGCCC AAAAG 

TTAAACA ATG 

1 1 MMM^MMl W 

GCC ATTGACA 

21A0 

2150 

2160 

TCTACACAAA 

1 \* 1 AW AW A A A 

TCC A AAAT.f.A 

1 WWAAAAwVA 

AGGGA AA ATT 

2200 

2210 

2220 

ACTCC AGTAT 

TTGCCAT AAA 

GAAAAAAGAC 

2260 

2270 

2280 

AGAGAACTTA 

ATAAGAGAAC 

TCAAGACTTC 


2290 2300 2310 2320 2330 2340 

TGGGAAGTTC AATTAGGAAT ACCACATCCC GCAGCGTTAA AAAAGAAAAA ATCAGTAACA 


2410 2420 2430 2440 2450 2460

ACTCCATTTA CCATACCTAG TATAAACAAT CAGACACCAG CCATTACATA TCACTACAAT


2470 2480 2490 2500 2510 2520

GTCCTTCCAC AGGGATGCAA AGGATCACCA GCAATATTCC AAAGTAGCAT GACAAAAATC

2530 2540 2550 2560 2570 2580

TTAGACCCTT TTAGAAAACA AAATCCAGAC ATAGTTATCT ATCAATACAT GGATCATTTG

2590 2600 2610 2620 2630 2640

TATGTAGGAT CTGACTTAGA AATAGGGCAG CATAGAACAA AAATAGAGGA GCTGAGACAA

2650 2660 2670 2680 2690 2700

CATCTGTTGA GGTGGGGACT TACCACACCA GACAAAAAAC ATCAGAAAGA ACCTCCATTC

2710 2720 2730 2740 2750 2760

CTTTCGATGG GTTATGAACT CCATCCTGAT AAATCGACAG TACAGCCTAT AGTCCTGCCA

2770 2780 2790 2800 2810 2820

GAAAAAGACA CCTGGACTGT CAATGACATA CAGAAGTTAG TGGGAAAATT GAATTGGGCA

2830 2840 2850 2860 2870 2880

AGTCAGATTT ACCCAGCGAT TAAACTAACG CAATTATGTA AACTCCTTAG AGGAACCAAA

2890 2900 2910 2920 2930 2940

GCACTAACAG AAGTAATACC ACTAACAGAA GAAGCAGAGC TAGAACTGGC AGAAAACAGA

2950 2960 2970 2980 2990 3000

GAGATTCTAA AACAACCACT ACATGGAGTC TATTATGACC CATCAAAAGA CTTAATACCA

3010 3020 3030 3040 3050 3060

GAAATACAGA AGCAGGGGCA AGGCCAATGG ACATATCAAA TTTATCAAGA GCCATTTAAA

3070 3080 3090 3100 3110 3120

AATCTGAAAA CAGCAAAATA TGCAAGAACG ACGGGTGCCC ACACTAATGA TGTAAAACAA

3130 3140 3150 3160 3170 3180

TTAACAGAGG CAGTGCAAAA AATAACCACA GAAAGCATAG TAATATGGGG AAAGACTCCT

3190 3200 3210 3220 3230 3240

AAATTTAAAC TACCCATACA AAAGGAAACA TGGCAAACAT GGTCGACAGA GTATTGGCAA

3250 3260 3270 3280 3290 3300

GCCACCTGCA TTCCTGAGTG GGAGTTTGTC AATACCCCTC CTTTACTGAA ATTATCGTAC

3310 3320 3330 3340 3350 3360

CAGTTAGAGA AACAACCCAT AGTAGGAGCA CAAACGTTCT ATGTAGATGG GGCAGCTAGC

3370 3380 3390 3400 3410 3420

AGGGAGACTA AATTAGCAAA ACCAGGATAT CTTACTAATA GAGGAACACA AAAAGTTGTC

3430 3440 3450 3460 3470 3480

ACCCTAACTC ACACAACAAA TCAGAAGACT GAGTTACAAG CAATTCATCT ACCTTTCCAG

3490 3500 3510 3520 3530 3540

GATTCCCGAT TAGAAGTAAA TATAGTAACA CACTCACAAT ATCCATTAGG AATCATTCAA

3550 3560 3570 3580 3590 3600

GCACAACCAG ATAAAAGTGA ATCAGAGTTA GTCAATCAAA TAATAGAGCA CTTAATAAAA

3610 3620 3630 3640 3650 3660


3730 3740 3750 3760 3770 3780 

GCCCAAGATG AACATGAGAA ATATCACACT AATTCCACAG CAATCGCTAG TGATTTTAAC 

3790 3800 3810 3820 3830 3840

CTGCCACCTG TAGTAGCAAA AGAAATAGTA GCCAGCTGTC ATAAATGTCA GCTAAAAGGA


3850 3860 3870 3880 3890 3900

GAAGCCATCC ATGGACAAGT AGACTGTAGT CCAGGAATAT GGCAACTAGA TTGTACACAT

3910 3920 3930 3940 3950 3960

TTAGAAGGAA AAGTTATCCT GGTAGCAGTT CATGTAGCCA GTGGATATAT ACAAGCAGAA

3970 3980 3990 4000 4010 4020

CTTATTCCAC CAGAAACAGG GCAGGAAACA CCATACTTTC TTTTAAAATT AGCAGGAAGA

4030 4040 4050 4060 4070 4080

TCGCCACTAA AAACAATACA TACAGACAAT GGCAGCAATT TCACCAGTAC TACGGTTAAG

4090 4100 4110 4120 4130 4140

CCCGCCTGTT GGTGGGCGGG AATCAAGCAG GAATTTGGAA TTCCCTACAA TCCCCAAAGT

4150 4160 4170 4180 4190 4200

CAACCACTAC TAGAATCTAT GAATAAAGAA TTAAAGAAAA TTATAGGCCA GGTAAGAGAT

4210 4220 4230 4240 4250 4260

CACGCTCAAC ATCTTAAGAC AGCAGTACAA ATGGCAGTAT TCATCCACAA TTTTAAAAGA

4270 4280 4290 4300 4310 4320

AAACGGCGGA TTGGGGGGTA CAGTCCAGGG GAAAGAATAG TAGACATAAT AGCAACAGAC

4330 4340 4350 4360 4370 4380

ATACAAACTA AAGAATTACA AAAACAAATT ACAAAAATTC AAAATTTTCG GGTTTATTAC

4390 4400 4410 4420 4430 4440

AGCGACAGCA GAGATCCACT TTGGAAAGGA CCAGCAAAGC TCCTCTGGAA AGGTGAAGGG

4450 4460 4470 4480 4490 4500

CCAGTAGTAA TACAAGATAA TAGTGACATA AAAGTAGTGC CAAGAAGAAA AGCAAAGATC

4510 4520 4530 4540 4550 4560

ATTAGCGATT ATGGAAAACA GATGGCAGGT GATGATTGTG TGGCAAGTAG ACAGGATGAG

4570 4580 4590 4600 4610 4620

GATTAGAACA TGCAAAAGTT TAGTAAAACA CCATATGTAT GTTTCAGGGA AAGCTAGGGG

4630 4640 4650 4660 4670 4680

ATCGTTTTAT AGACATCACT ATCAAAGCCC TCATCCAAGA ATAAGTTCAG AAGTACACAT

4690 4700 4710 4720 4730 4740

CCCACTAGGG GATGCTAGAT TGCTAATAAC AACATATTGG GGTCTGCATA CAGGAGAAAG

4750 4760 4770 4780 4790 4800

AGACTGGCAT CTGGGTCAGG GAGTCTCCAT AGAATGGAGG AAAAAGAGAT ATAGCACACA

4810 4820 4830 4840 4850 4860

AGTACACCCT CAACTAGCAG ACCAACTAAT TCATCTGTAT TACTTTGACT GTTTTTCAGA

4870 4880 4890 4900 4910 4920


■^^■■■■■WWW^ffW^WAT^: .IAUJIA CFIGCCACTA OC AGC ATTAA T AACACC AAA 

4990 5000 5010 5020 5030 5040

AAAGATAAAG CCACCTTTCC CTACTCTTAC GAAACTCACA CACCATAGAT GGAACAAGCC

5050 5060 5070 5080 5090 5100

CCAGAAGACC AAGGGCCACA GAGGGAGCCA CACAATCAAT GGACACTAGA GCTTTTAGAG

5110 5120 5130 5140 5150 5160

GAGCTTAAGA ATGAAGCTGT TAGACATTTT CCTAGGATTT GGCTCCATGG CTTAGGGCAA

5170 5180 5190 5200 5210 5220

CATATCTATG AAACTTATGG GGATACTTCG GCAGGAGTGG AAGCCATAAT AAGAATTCTG

5230 5240 5250 5260 5270 5280

CAACAACTGC TGTTTATCCA TTTCAGAATT GGGTGTCGAC ATAGCAGAAT AGGCGTTACT

5290 5300 5310 5320 5330 5340

CAACAGAGGA GAGCAAGAAA TGGAGCCAGT AGATCCTAGA CTAGAGCCCT GGAAGCATCC

5350 5360 5370 5380 5390 5400

AGGAAGTCAG CCTAAAACTG CTTGTACCAC TTGCTATTGT AAAAAGTGTT GCTTTCATTG

5410 5420 5430 5440 5450 5460

CCAAGTTTGT TTCACAACAA AAGCCTTAGG CATCTCCTAT GGCAGGAAGA AGCGCAGACA

5470 5480 5490 5500 5510 5520

GCGACGAAGA CCTCCTCAAG GCAGTCAGAC TCATCAAGTT TCTCTATCAA AGCAGTAAGT

5530 5540 5550 5560 5570 5580

AGTACATGTA ATGCAACCTA TACAAATACC AATAGCAGCA TTAGTAGTAG CAATAATAAT

5590 5600 5610 5620 5630 5640

AGCAATAGTT GTGTGGTCCA TAGTAATCAT AGAATATAGG AAAATATTAA GACAAAGAAA

5650 5660 5670 5680 5690 5700

AATAGACAGG TTAATTGATA GACTAATAGA AAGAGCAGAA GACAGTGGCA ATGAGAGTGA

5710 5720 5730 5740 5750 5760

AGGAGAAATA TCAGCACTTG TGGAGATGGG GGTGGAAATG GGCCACCATG CTCCTTGGGA

5770 5780 5790 5800 5810 5820

TATTGATGAT CTGTAGTGCT ACAGAAAAAT TGTGGGTCAC AGTCTATTAT GGGGTACCTG

5830 5840 5850 5860 5870 5880

TGTGGAAGGA AGCAACCACC ACTCTATTTT GTGCATCAGA TGCTAAAGCA TATGATACAG

5890 5900 5910 5920 5930 5940

AGGTACATAA TGTTTGGGCC ACACATGCCT GTGTACCCAC AGACCCCAAC CCACAAGAAG

5950 5960 5970 5980 5990 6000

TAGTATTGGT AAATGTGACA GAAAATTTTA ACATGTGGAA AAATGACATG GTAGAACAGA

6010 6020 6030 6040 6050 6060

TGCATGAGGA TATAATCAGT TTATGGGATC AAAGCCTAAA GCCATGTGTA AAATTAACCC

6070 6080 6090 6100 6110 6120

CACTCTGTGT TAGTTTAAAG TGCACTGATT TGGCCAATGC TACTAATACC AATAGTAGTA

6130 6140 6150 6160 6170 6180



rr a at att Aft 

tap a Ar.rnT a 

AC ACGTAAGC 

T GP AG A A AG A 
• \i \* m v m /* ** v ** 

AT ATGP A TT T 
AfAIWWAI I 1 

TTTTA Til a f* 
1 1 l 1 A 1 AAA^ 

6250 

. 6260 

6270 

6280 

6290 

6300 

TTflATATAAT 

1 IWMVAIAm* 

ACT A AT AC AT 

AATGATAPTA 
A A 1 W A I AW * A 

CC AGCT AT AC 

W W A V W 1 A| AW 

GTTGAP A ACT 

Wl IvAWAAWl 

TftT A AP A PPT 
• W I AA w A WW 1 

6310 

6320 

6330 

6340 

6350 

6360 

f AfiTCATTAC 

W A W 1 W A 1 1 

AC A GCPC T(1T 

A V A W W W W 1 W 1 

PP A AAGGT AT 

W W A A A W W 1 A I 

CCTTTCAGCP 

WW III WAWWW 

AATTPCP AT A 

AA I 1WWWAIA 

P ATT ATTPTft 
W A I 1 A 1 Iwlw 

6370 

6390 

6390 

6400 

6410 

6420 

pcppccctgg 

Uwwwviuw i w w 

TT T TGCG ATT 

1 1 1 1 V WV At! 

CT AAA ATGT A 

AT A AT A Aft AP 

Al AA 1 A AW AW 

ftTTP A ATftftA 
vl IWAAIWWA 

KC Aftft APP AT 
AU A WW Aw W A 1 

6430 

64 40 

6450 

6460 

6470 

6480 

CT AT A AATCT 

V 1 A W A AM I Vl| 

r AfiTACACTA 

W H W \* M W M V 1 A 

C A ATGT AC AP 

W A A 1 V 1 m\« 

ATGGAATTAG 

A 1 V V A A 1 | Av 

GPP Aft TACT A 
Www AW 1 Awl A 

TP A AP TP A AP 
• W A AW 1 W A Aw 

6490 

6500 

6510 

6520 

6530 

6540 

TGCTGTTGAA 

1 V w 1 V ■ 1 V M^* 

TGGCAGTCT A 

GCAGAAGA AG 

WWAWAAWAAW 

AGGTAGT A A T 

A W W 1 A V 1 AA 1 

TAG ATPTGCP 
I A v A 1 W 1 WW w 

A A T TT P A P A r 
AA III WA w A W 

6550 

6560 

6570 

6580 

6590 

6600 

AfA a tgct aa 

M W A A t W W 1 MA 

AACCATAAT A 

A A W W A 1 MA I A 

GT APA ftPTGA 

AP p A ATPTGT 

A V W A A 1 V 1 W 1 

Aft A A ATT A AT 
AW AAA 1 1 AA 1 

TftT AP A AP AP 
1 W I Aw AAwAw 

6610 

6620 

6630 

6640 

6650 

6660 

CC AACAACA A 

T AC A ACAAAA 

ACT ATCCGTA 

M V 1 M 1 W W V 1 M 

T PP Aft AGGGG 
1 WWAWAWWWW 

APP Aftft G AC A 
Aw W AvW W AW A 

C4* ATTTftTTA 
WW A | 1 I (# I | A 

6670 

6680 

6690 

6700 

6710 

6720 

P A AT AGG A A A 

W A A 1 M V V M M M 

A AT ACCA A A T 

ATG Aft AP A Aft 

P AC ATTPT A A 
V Aw A 1 1 v 1 A A 

P A TT ACT Aft A 
W A I 1 Aw I AwA 

iZf AAA iTr P A 

ww A AAA 1 ww A 

6730 

67*0 

6750 

6760 

6770 

6780 

ATGCCACTTT 

mi V \# w *• W III 

AA AAC AG ATA 

m m m m\* m\J m 1 A 

GCT ACC AAAT 

W W 1 A V W M A A 1 

T A Aft AG A AP A 
1 AAwAwAAwA 

A T TT ft ft A A A T 
A 1 1 1 ww AAA I 

AATAAAAPA A 
AA I AAAAwAA 

6790 

6800 

6810 

6820 

6830 

68 40 

TAATCTTTAA 

1 M M I W 8 1 i mm 

CC A ATCCTPA 

V W M A 1 W W V W M 

GGAGGGGAPP 

WWAWWWWAWW 

P Aft A A ATTft T 
W Aw A A A 1 | v 1 

A A PGP AP AftT 
AAwWWAWAW I 

TTT AATTftTr 
111 AA 1 Iwlw 

6850 

6860 

6870 

6880 

6890 

6900 

GAGGGGAATT 

V M V W V #^ 1 ■ 

TTTCTACTGT 

4 1 1 W ■ P*V 1 W • 

AATTC A APAC 

A APTCTTTAA 
A A W 1 Ul 1 1 A A 

TAftT ACTTftft 
I A w 1 AW 1 1 W v 

TTTAATA^TA 
III AA 1 AO f A 

6910 

6920 

6930 

6940 

6950 

6960 

CTTGGACTAC 

TC A AGGGTC A 

1 v m •* v v V 1 V A 

A AT AAC ACTft 

AMI MM V A V 1 V 

A Aftft A AGTft A 
A AW v A AW I W A 

PAPA ATP AP A 
WAwAA 1 W AW A 

PTPPPATftPA 
w 1 www A 1 WW A 

6970 

6980 

6990 

7000 

7010 

7020 

GA A TA AA ACA 

ATT TATAAAC 

ATGTCGCAGG 

Ml V I W W W A W W 

A ACT A CC AAA 

A A V 1 A W V A A A 

AGP A ATGT AT 
A w W A A 1 w 1 A 1 

Wwwww 1 WWwA 

7030 

7040 

7050 

7060 

7070 

7080 

TC ACCGC AP A 

1 WAWWWWAWA 

A AT T Aft AT/IT 

AMI 1 A W A V W 1 

TP A TP A A A T A 

TT AP A ft P r P T 
1 1 AwAuvvw I 

PPTATTAAPA 
ww 1 At 1 AACA 

AUAWA 1 ww 1 w 

7090 

7100 

7110 

7120 

7130 

7140 

CTAATAACAA 

VI M M 1 M M W A A 

C A ATGGGTCC 

YrfMM| V WW 1 WW 

G AGATCTTP A 

V A V A • W 1 1 V A 

GAPP TGft Aftft 
WAWW 1 WWAWW 

Aftft Aft AT ATP 
AW W A W A 1 A I W 

Aft ft ft A P A ATT 
AvUWAwAAl | 

7X50 

7160 

7170 

7180 

7190 

7200 

GC AGAAGTCA 

ATT AT AT AAA 

Ml IMtAIAAA 

T AT AA ACT AG 

• A I M A A V 1 A V 

T A A A A ATTG A 
1 AAAAA 1 I WA 

APP ATT A PC A 
Av w A 1 1 A WW A 

ft T AftP APPP A 
w 1 AwWAwwwA 

7210 

7220 

7230 

72*0 

7250 

7260 

CC A AGCC A AA 

G A ft A A ftAGTC 

WM VMM WAV ■ V 

ft T GC Aft AG AG 
v | WWAWMWAW 

AAAAAAGAGC 
AAAAAAwAWW 

Aft Tftftft A ATA 
Aw 1 WWW A A I A 

ftp AftP TTTft T 
UwAww 1 1 (Vl 

7270 

7280 

7290 

7300 

7310 

7320 

TCCTTCCCTT 

CTTGGGAGCA 

CCAGG A AGC A 

V V* M W A A W W M 

C TATGGGCGC 

W 1 Al v V W WW W 

APGGTPA ATft 
Aw WWIWAAlw 

Afft/*Tft A PftP 
Aw ww 1 wAwwW 

7330 

7340 

7350 

7360 

7370 

7380 

TACAGGCC AG 

ACA ATTATTG 

#4 V #* M 1 1 M 1 IV 

TCTCCTATAft 

1 V 1 V W 1 A 1 A v 

TftP AftP AftP A 
1 WwAwwAvwA 

GAACAATTTG^CTOAGCCCTA 

7390 

7400 

7410 

7420 

7430 

7440 


CAATCCTCCC TGTGGAAAGA TACCTAAAGG ATCAACAGCT CCTCGGGATT TGGGGTTGCT

7510 7520 7530 7540 7550 7560

CTGGAAAACT CATTTGCACC ACTCCTGTGC CTTCGAATGC TACTTGGAGT AATAAATCTC

7570 7580 7590 7600 7610 7620

TCGAACAGAT TTGGAATAAC ATGACCTGGA TGGAGTGGGA CAGAGAAATT AACAATTACA

7630 7640 7650 7660 7670 7680

CAAGCTTAAT ACATTCCTTA ATTGAAGAAT CGCAAAACCA GCAAGAAAAG AATGAACAAG

7690 7700 7710 7720 7730 7740

AATTATTGGA ATTAGATAAA TGGGCAAGTT TGTGGAATTG GTTTAACATA ACAAATTGCC

7750 7760 7770 7780 7790 7800

TGTGGTATAT AAAAATATTC ATAATGATAG TAGCACGCTT GGTAGGTTTA AGAATACTTT

7810 7820 7830 7840 7850 7860

TTGCTGTACT TTCTATAGTC AATAGAGTTA GGCAGGGATA TTCACCATTA TCGTTTCAGA

7870 7880 7890 7900 7910 7920

CCCACCTCCC AACCCCGACG GGACCCGACA GGCCCGAAGG AATAGAAGAA GAAGGTGCAG

7930 7940 7950 7960 7970 7980

AGAGAGACAG AGACAGATCC ATTCGATTAG TGAACGGATC CTTAGCACTT ATCTGGGACG

7990 8000 8010 8020 8030 8040

ATCTGCGGAG CCTTGTGCCT CTTCAGCTAC CACCGCTTGA GAGACTTACT CTTGATTGTA

8050 8060 8070 8080 8090 8100

ACGAGGATTG TGGAACTTCT GGGACGCAGG GGGTGGGAAG CCCTCAAATA TTGGTGGAAT

8110 8120 8130 8140 8150 8160

CTCCTACAGT ATTGGAGTCA GGAACTAAAG AATAGTGCTG TTAGCTTGCT CAATGCCACA

8170 8180 8190 8200 8210 8220

GCCATAGCAG TAGCTGAGGG GACAGATAGG GTTATAGAAC TAGTACAAGG ACCTTGTACA

8230 8240 8250 8260 8270 8280

GCTATTCGCC ACATACCTAG AAGAATAAGA CAGGCCTTGG AAAGGATTTT GCTATAAGAT


8290 8300 8310 8320 8330 8340

GGGTCGCAAG TGGTCAAAAA GTAGTGTGGT TCCATGGCCT ACTGTAAGGG AAAGAATGAG

8350 8360 8370 8380 8390 8400

ACGAGCTGAG CCAGCAGCAG ATGGGGTGGG AGCAGCATCT CGAGACCTGG AAAAACATGG

8410 8420 8430 8440 8450 8460

AGCAATCACA AGTAGCAATA CAGCAGCTAC CAATGCTGCT TGTGCCTCGC TAGAAGCACA

8470 8480 8490 8500 8510 8520

AGACGAGCAG GAGGTGCCTT TTCCAGTCAC ACCTCAGGTA CCTTTAAGAC CAATGACTTA

8530 8540 8550 8560 8570 8580

CAAGGCAGCT GTAGATCTTA GCCACTTTTT AAAAGAAAAG GGGGGACTGG AAGGGCTAAT

8590 8600 8610 8620 8630 8640

TCACTCCCAA CGAAGACAAG ATATCCTTGA TCTGTGGATC TACCACACAC AACGCTACTT

8650 8660 8670 8680 8690 8700

8710 8720 8730 8740 8750 8760

CTCCTACAAC CTAGTACCAG riGACCCAGA TAAGGTAGAA GAGGCCAATA AAGGAGAGAA

8770 8780 8790 8800 8810 8820

CACCAGCTTG TTACACCCTG TGAGCCTGCA TGGAATGGAT GACCCTGAGA GAGAAGTCTT

8830 8840 8850 8860 8870 8880

AGAGTGGAGG TTTGACAGCC GCCTAGCATT TCATCACGTG GCCCGAGACC TGCATCCGGA

8890 8900 8910 8920 8930 8940

GTACTTCAAG AACTGCTGAC ATCGAGCTTG CTACAAGGGA CTTTCCGCTG GGGACTTTCC

8950 8960 8970 8980 8990 9000

AGGGAGGCGT GGCCTGGGCG GAACTGGGGA GTGGCGAGCC CTCAGATGCT GCATATAACC

9010 9020 9030 9040 9050 9060

AGCTGCTTTT TGCCTCTACT GGGTCTCTCT GGTTAGACCA GATTTGAGCC TGGGAGCTCT

9070 9080 9090 9100

CTGGCTAACT AGGGAACCCA CTGCTTAAGC CTCAATAAAG CTT
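
The sequence listing follows a fixed layout: a line of position labels, each giving the position of the last nucleotide of a block of ten, followed by a line of nucleotide blocks. A minimal Python sketch of how such a listing could be read back and checked against its labels is given below. The helper `read_listing` is hypothetical, the labels for the first row are supplied by assumption, and the sketch assumes undamaged rows in which every block holds exactly ten nucleotides.

```python
def read_listing(text: str) -> str:
    """Join the nucleotide blocks of a labelled listing into one string.

    Lines made only of digits and spaces are treated as position labels;
    every other non-empty line is treated as a row of nucleotide blocks.
    Each label must equal the running length at the end of its block.
    """
    sequence = []
    length = 0
    labels = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if all(tok.isdigit() for tok in tokens):
            labels = [int(tok) for tok in tokens]
            continue
        for i, block in enumerate(tokens):
            block = block.upper()
            if set(block) - set("ACGT"):
                raise ValueError(f"unexpected character in block {block!r}")
            length += len(block)
            if labels is not None and i < len(labels) and labels[i] != length:
                raise ValueError(f"label {labels[i]} does not match length {length}")
            sequence.append(block)
        labels = None
    return "".join(sequence)


# First two rows of the listing (the fifth block of row two read with G for "6").
listing = """
10 20 30 40 50 60
AAGCTTGCCT TGAGTGCTTC AAGTAGTGTG TGCCCGTCTG TTCTGTGACT CTGGTAACTA
70 80 90 100 110 120
CAGATCCCTC AGACCCTTTT AGTCAGTGTG GAAAATCTCT AGCAGTGGCG CCCGAACACG
"""
lav = read_listing(listing)
print(len(lav))   # 120
print(lav[:6])    # AAGCTT
```

A row whose blocks were damaged in scanning fails the label check, which makes this a convenient way to locate positions where the printed sequence needs to be compared against the original figures.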




